MusicSarreV3, Main, Exploration, bibRecord, 000971

On the Road to High-Quality POS-Tagging

Internal identifier: 000971 (Main/Exploration); previous: 000970; next: 000972

Authors: Stefan Klatt [Austria]; Karel Oliva [Austria]

Source:

Lecture Notes in Computer Science [ISSN 0302-9743]; 2005.

RBID : ISTEX:96C50054BE9FF5B6C161E8EC182C63333573ACCB

English descriptors

Teeft :
- Analysis systems, Annotation, Better precision rate, Better results, Central bank, Computational linguistics, Corpus, Corpus frequency, Correct assignment, Correct reading, Correct readings, Corrua readings, Current tokenizers, Decision window, Entire article, Entity recognizer, Error rate, Error rates, External sector, Foreign material, Foreign words, Inflationary expectations, Input problems, Klatt, Length tokens, Lexical, Lexical analysis, Linguistic point, Linguistic rule, Linguistic rules, Linguistic tagger, Linguistic taggers, Many errors, Module, Module uwrb, Modules uwcb, Modules uwrb, More readings, Morphological analysis, Multiword units, Negra, Negra corpus, Next section, Nite verb, Noun, Oliva, Original sense, Other words, Output problems, Parameterizable threshold, Pipeline architecture, Precision rate, Processing architecture, Proper nouns, Quotation marks, Relative pronoun, Second case, Single quotation mark, Statistical tagger, Statistical taggers, Such tokens, Tagger, Test corpus, Text material, Tokenization, Training material, Ungrammatical readings, Unknown tokens, Unknown word, Unknown words, Uwrb, Verb reading, Whole sentential context, Wieder einmal, Word vergessen.

Abstract

Abstract: In this paper, we present techniques aimed at avoiding typical errors of state-of-the-art POS-taggers and at constructing high-quality POS-taggers with extremely low error rates. Such taggers are very helpful, if not even necessary, for many NLP applications organized in a pipeline architecture. The appropriateness of the suggested solutions is demonstrated in several experiments. Although these experiments were performed only with German data, the proposed modular architecture is applicable for many other languages, too.

URL:

https://api.istex.fr/document/96C50054BE9FF5B6C161E8EC182C63333573ACCB/fulltext/pdf

DOI: 10.1007/11551263_31

Links toward previous steps (curation, corpus...)

to stream Istex, to step Corpus: 000F61
to stream Istex, to step Curation: 000E84
to stream Istex, to step Checkpoint: 000770
to stream Main, to step Merge: 000970
to stream Main, to step Curation: 000971

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct:series"><teiHeader><fileDesc><titleStmt><title xml:lang="en">On the Road to High-Quality POS-Tagging</title>
<author><name sortKey="Klatt, Stefan" sort="Klatt, Stefan" uniqKey="Klatt S" first="Stefan" last="Klatt">Stefan Klatt</name>
</author>
<author><name sortKey="Oliva, Karel" sort="Oliva, Karel" uniqKey="Oliva K" first="Karel" last="Oliva">Karel Oliva</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:96C50054BE9FF5B6C161E8EC182C63333573ACCB</idno>
<date when="2005" year="2005">2005</date>
<idno type="doi">10.1007/11551263_31</idno>
<idno type="url">https://api.istex.fr/document/96C50054BE9FF5B6C161E8EC182C63333573ACCB/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000F61</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">000F61</idno>
<idno type="wicri:Area/Istex/Curation">000E84</idno>
<idno type="wicri:Area/Istex/Checkpoint">000770</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000770</idno>
<idno type="wicri:doubleKey">0302-9743:2005:Klatt S:on:the:road</idno>
<idno type="wicri:Area/Main/Merge">000970</idno>
<idno type="wicri:Area/Main/Curation">000971</idno>
<idno type="wicri:Area/Main/Exploration">000971</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">On the Road to High-Quality POS-Tagging</title>
<author><name sortKey="Klatt, Stefan" sort="Klatt, Stefan" uniqKey="Klatt S" first="Stefan" last="Klatt">Stefan Klatt</name>
<affiliation wicri:level="3"><country xml:lang="fr">Autriche</country>
<wicri:regionArea>Austrian Research Institute for Artificial Intelligence, Freyung 6/6, A-1010, Vienna</wicri:regionArea>
<placeName><settlement type="city">Vienne (Autriche)</settlement>
<region nuts="2" type="province">Vienne (Autriche)</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Autriche</country>
</affiliation>
</author>
<author><name sortKey="Oliva, Karel" sort="Oliva, Karel" uniqKey="Oliva K" first="Karel" last="Oliva">Karel Oliva</name>
<affiliation wicri:level="3"><country xml:lang="fr">Autriche</country>
<wicri:regionArea>Austrian Research Institute for Artificial Intelligence, Freyung 6/6, A-1010, Vienna</wicri:regionArea>
<placeName><settlement type="city">Vienne (Autriche)</settlement>
<region nuts="2" type="province">Vienne (Autriche)</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Autriche</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2005</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="Teeft" xml:lang="en"><term>Analysis systems</term>
<term>Annotation</term>
<term>Better precision rate</term>
<term>Better results</term>
<term>Central bank</term>
<term>Computational linguistics</term>
<term>Corpus</term>
<term>Corpus frequency</term>
<term>Correct assignment</term>
<term>Correct reading</term>
<term>Correct readings</term>
<term>Corrua readings</term>
<term>Current tokenizers</term>
<term>Decision window</term>
<term>Entire article</term>
<term>Entity recognizer</term>
<term>Error rate</term>
<term>Error rates</term>
<term>External sector</term>
<term>Foreign material</term>
<term>Foreign words</term>
<term>Inflationary expectations</term>
<term>Input problems</term>
<term>Klatt</term>
<term>Length tokens</term>
<term>Lexical</term>
<term>Lexical analysis</term>
<term>Linguistic point</term>
<term>Linguistic rule</term>
<term>Linguistic rules</term>
<term>Linguistic tagger</term>
<term>Linguistic taggers</term>
<term>Many errors</term>
<term>Module</term>
<term>Module uwrb</term>
<term>Modules uwcb</term>
<term>Modules uwrb</term>
<term>More readings</term>
<term>Morphological analysis</term>
<term>Multiword units</term>
<term>Negra</term>
<term>Negra corpus</term>
<term>Next section</term>
<term>Nite verb</term>
<term>Noun</term>
<term>Oliva</term>
<term>Original sense</term>
<term>Other words</term>
<term>Output problems</term>
<term>Parameterizable threshold</term>
<term>Pipeline architecture</term>
<term>Precision rate</term>
<term>Processing architecture</term>
<term>Proper nouns</term>
<term>Quotation marks</term>
<term>Relative pronoun</term>
<term>Second case</term>
<term>Single quotation mark</term>
<term>Statistical tagger</term>
<term>Statistical taggers</term>
<term>Such tokens</term>
<term>Tagger</term>
<term>Test corpus</term>
<term>Text material</term>
<term>Tokenization</term>
<term>Training material</term>
<term>Ungrammatical readings</term>
<term>Unknown tokens</term>
<term>Unknown word</term>
<term>Unknown words</term>
<term>Uwrb</term>
<term>Verb reading</term>
<term>Whole sentential context</term>
<term>Wieder einmal</term>
<term>Word vergessen</term>
</keywords>
</textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: In this paper, we present techniques aimed at avoiding typical errors of state-of-the-art POS-taggers and at constructing high-quality POS-taggers with extremely low error rates. Such taggers are very helpful, if not even necessary, for many NLP applications organized in a pipeline architecture. The appropriateness of the suggested solutions is demonstrated in several experiments. Although these experiments were performed only with German data, the proposed modular architecture is applicable for many other languages, too.</div>
</front>
</TEI>
<affiliations><list><country><li>Autriche</li>
</country>
<region><li>Vienne (Autriche)</li>
</region>
<settlement><li>Vienne (Autriche)</li>
</settlement>
</list>
<tree><country name="Autriche"><region name="Vienne (Autriche)"><name sortKey="Klatt, Stefan" sort="Klatt, Stefan" uniqKey="Klatt S" first="Stefan" last="Klatt">Stefan Klatt</name>
</region>
<name sortKey="Klatt, Stefan" sort="Klatt, Stefan" uniqKey="Klatt S" first="Stefan" last="Klatt">Stefan Klatt</name>
<name sortKey="Oliva, Karel" sort="Oliva, Karel" uniqKey="Oliva K" first="Karel" last="Oliva">Karel Oliva</name>
<name sortKey="Oliva, Karel" sort="Oliva, Karel" uniqKey="Oliva K" first="Karel" last="Oliva">Karel Oliva</name>
</country>
</tree>
</affiliations>
</record>
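
As a rough illustration of how the core bibliographic fields could be read back out of the record above, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It works on a trimmed copy of the record, not the full one. Note one assumption: the record uses the `wicri:` prefix without declaring it, so the sketch adds a placeholder namespace URI (`urn:example:wicri`, hypothetical) on the root element so that a namespace-aware parser accepts it. The function name `summarize` is also illustrative and not part of Dilib.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record above. The xmlns:wicri declaration is an
# assumption added for parsing; the placeholder URI is hypothetical.
RECORD = """<record xmlns:wicri="urn:example:wicri">
<TEI wicri:istexFullTextTei="biblStruct:series"><teiHeader><fileDesc><titleStmt>
<title xml:lang="en">On the Road to High-Quality POS-Tagging</title>
<author><name sortKey="Klatt, Stefan" first="Stefan" last="Klatt">Stefan Klatt</name></author>
<author><name sortKey="Oliva, Karel" first="Karel" last="Oliva">Karel Oliva</name></author>
</titleStmt>
<publicationStmt>
<date when="2005" year="2005">2005</date>
<idno type="doi">10.1007/11551263_31</idno>
</publicationStmt></fileDesc></teiHeader></TEI>
</record>"""


def summarize(xml_text):
    """Return title, authors, year and DOI from a record shaped like the one above."""
    root = ET.fromstring(xml_text)
    stmt = root.find(".//titleStmt")
    return {
        "title": stmt.findtext("title"),
        # Author names are the text content of titleStmt/author/name.
        "authors": [name.text for name in stmt.findall("author/name")],
        "year": root.find(".//publicationStmt/date").get("when"),
        # Select the idno element whose type attribute is "doi".
        "doi": root.findtext(".//idno[@type='doi']"),
    }


if __name__ == "__main__":
    for field, value in summarize(RECORD).items():
        print(field, value)
```

The lookups stay inside `titleStmt` and `publicationStmt` on purpose: the full record repeats the title and author names under `sourceDesc` and `affiliations`, so an unscoped search would return duplicates.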

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Sarre/explor/MusicSarreV3/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000971 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000971 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Sarre
   |area=    MusicSarreV3
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:96C50054BE9FF5B6C161E8EC182C63333573ACCB
   |texte=   On the Road to High-Quality POS-Tagging
}}

This area was generated with Dilib version V0.6.33.
Data generation: Sun Jul 15 18:16:09 2018. Site generation: Tue Mar 5 19:21:25 2024

	Exploration server for music in Saarland
	Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.